ABSTRACT


Trial-to-trial  changes  in  the  proportion  of  human  subjects 
predicting  the  occurrence  of  one  of  two  events  in  a  complex  sequence 
of  binary  events  (probability  learning)  are  analyzed  in  terms  of 
several  simple  models.  The  direction  of  change  predicted  by  linear- 
operator  reinforcement  models  (Estes,  Bush  and  Mosteller)  is  wrong  on 
about  75%  of  the  trials.  A  no-learning  model,  a  time-dependent  decay 
model,  and  a  cycle-dependent  decay  model  are  used  to  provide  some 
insight  into  the  nature  of  probability  learning. 

Some  suboptimal  procedures  for  estimating  parameters  of 
stochastic  processes  are  compared.  The  method  of  minimum  absolute  error 
is recommended as being very useful.

Masanao Toda

A couple of years ago, Professor Mosteller gave a presidential address
to the Psychometric Society entitled "The mystery of the missing corpus"
(Mosteller, 1958). In a slightly different context, I sometimes feel about
my own research on the guess process that I am trying to solve a like mystery
called "The Case of a Deceptive Beauty." But unlike Sherlock Holmes or
Perry Mason, I am no genius as a detective. I am just a plain, ordinary man
with dogged perseverance, and I have just succeeded in getting a confession
from my suspect, that deceptive beauty, known as the guess process, also as
probability learning. And I am still wondering whether this confession
might be another deception, and whether I am just making a fool of myself by
triumphantly talking about this confession. Anyway, the confession is not
yet consistent, and I am not yet at the stage of getting a successful trial.
However, here is one thing about which I can talk with complete
confidence: this deceptive beauty, probability learning, has a very
complicated character, no matter how plainly simple she may appear, and no
matter how many psychologists are honoring her simplicity by sonnets in the
form of simple stochastic learning theories.

My plan for today's talk on my unfinished detective story is like this.
First, I will introduce her to you formally with appropriate courtesy;
second, I will tell you something about her shadowy inside life when she is
out of sight; and then, finally, I will say something about police science,
or parameter estimation.

I think most of you are familiar with guessing experiments, or two-armed
bandit experiments. So I will show you just an example of the experimental
procedure. Imagine a deck of, say, playing cards. The experimenter shows
these cards one by one from the top of the deck. Now, the experimental Ss'
task is to predict the color of each card each time before the card is
shown. This is just a kind of game, and Ss are encouraged to maximize the
number of correct predictions. That's all. Suppose the total number of Ss
is $N$. Suppose the number of Ss who predicted "black" on trial $i$ is $n_i$.
Then I call $n_i/N$ the guessing quotient with respect to the "black"
response on trial $i$. By plotting these guessing quotients on all the trials
we obtain the guessing curve. You will see examples of guessing curves in
Figures 1, 2, and 3. Please look at Figure 3 first. The short lines attached
to the top and the bottom lines of the graph represent the arrangement of
the cards used in the experiment. There are two short lines attached to the
leftmost part of the top line, which are then followed by a blank. And
wherever there is a blank on the top line you will find a short line on
the bottom line. These three short lines then mean that the first two cards
were blue, and the third was red. So you will see that the arrangement of
the cards, or the sequence of events, used here is a random sequence with
probability $\pi(B) = .75$ for obtaining blue. Now the ordinate of this graph
shows the values of guessing quotients with respect to prediction of blue.
Three groups of Ss were given this same sequence. So the three points
corresponding to the first trial indicate that in each group about 50 or
60% of the Ss predicted blue as the color of the first card.
After Ss made their responses, they were shown the first card, which was
blue, and each of them recorded his prediction and the color of the card in
the answer sheet, and then they proceeded to predicting the color of the
next card.

Now, I think some of you who are familiar with guessing experiments
might be puzzled by this figure. Guessing curves you find in psychological
journals do not usually look like this. Usually, they start off at about
0.5, and smoothly and monotonically approach a certain asymptote. But here
in Figure 3, there is nothing smooth and nothing monotonic. This reminds
me of a joke: Mona Lisa had a toothache and Leonardo had an ideal model.
Now you can pull out all the teeth of the original guessing curves like
those presented in Figure 3 by averaging guessing quotients over each
block of ten or more trials. And this is what usually is done by
psychological artists, or dentists, and as a result we get mysteriously
simple and smooth guessing curves. Averaging across blocks of trials is,
of course, a completely legitimate procedure if these structures of the
curves are just outcomes of random fluctuations. But random fluctuations
cannot be reproduced so regularly as occurs in Figure 3 as well as in
Figures 1 and 2.
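The two operations just described, forming the guessing quotient on each trial and then averaging over blocks of trials, can be sketched as follows. This is only an illustrative sketch: the function names and the 0/1 coding of responses are assumptions made for the example, not part of the original analysis.

```python
def guessing_quotients(responses):
    """Guessing quotient n_i / N for every trial.

    responses[s][i] is 1 if subject s predicted the specified event
    on trial i and 0 otherwise (a hypothetical coding).
    """
    n_subjects = len(responses)
    n_trials = len(responses[0])
    return [sum(row[i] for row in responses) / n_subjects
            for i in range(n_trials)]


def block_average(values, block_size):
    """Mean of each consecutive block; an incomplete last block is dropped."""
    n_blocks = len(values) // block_size
    return [sum(values[b * block_size:(b + 1) * block_size]) / block_size
            for b in range(n_blocks)]


# A quotient that swings regularly between two levels looks flat
# once it is averaged over blocks of two trials.
swinging = [0.2, 0.8, 0.2, 0.8, 0.2, 0.8]
flattened = block_average(swinging, 2)
```

The small example at the end shows the point of the passage: a trial-by-trial structure that is perfectly regular disappears after block averaging.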

However, I could hardly do justice to the beauty of the averaged
guessing curves if I said it is all due to the plastic surgery of averaging.
There is a mystery, something beyond that, and it concerns the
asymptote which the smooth averaged guessing curves approach. As far as the
experiment is conducted under ordinary conditions, i.e., when Ss are just
guessing and not making money in proportion to the number of correct
predictions, the asymptotic value of $P(B)$, the guessing quotient with
respect to the event $B$, is almost always approximately equal to $\pi(B)$,
no matter what the value of $\pi(B)$.

This effect is called probability matching.


This result has puzzled many people, since to match response
probability with event probability is so obviously non-optimal if Ss are
maximizing the number of correct predictions and if they know the event
sequence is random. Suppose $\pi(B) > \pi(R)$. Then S can get $\pi(B)$ as the
mean number of correct predictions per trial if he always predicted blue.
But if he matches his prediction probability with $\pi(B)$, then his mean
number of hits reduces to $\pi(B)^2 + \pi(R)^2 \leq \pi(B)$. Equality holds
only for $\pi(B) = 1$.*
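The comparison between the two strategies can be checked numerically. The sketch below assumes a random binary sequence with a fixed probability `pi` for the more frequent event; the function names are chosen only for this illustration.

```python
def hits_matching(pi):
    """Mean hits per trial when the prediction probability equals pi."""
    return pi * pi + (1.0 - pi) * (1.0 - pi)


def hits_maximizing(pi):
    """Mean hits per trial when the more frequent event is always predicted."""
    return max(pi, 1.0 - pi)


# For pi = 0.75 matching yields 0.625 hits per trial,
# while always predicting the frequent event yields 0.75.
loss = hits_maximizing(0.75) - hits_matching(0.75)
```

For any `pi` strictly between 0.5 and 1 the difference is positive, which is the sense in which probability matching is non-optimal.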

The reactions to this effect among psychologists who were interested
in this process were not unanimous. A group of people including myself
were rather deeply annoyed by this apparent irrationality and attempted to
prove either or both of the following two hypotheses: (1) Ss were not
simply maximizing the plain, unweighted total number of correct predictions;
(2) Ss were not perceiving the event sequence as random. One of my Ss told
me that he could not resist the temptation of trying to hit the jackpot by
predicting a very infrequent event. If this kind of uneven utilities for
more frequent and less frequent events is responsible for probability
matching, we should be able to get rid of probability matching by inducing
an even utility distribution by means of paying Ss money in proportion to
the correct predictions. This hypothesis has been very well confirmed by
a couple of experiments done by different people. Obviously Ss preferred
real pennies to imaginary jackpots.


*PROOF: Let $\pi \geq 1/2$. Put $\pi = 1/2 + \epsilon$, $0 \leq \epsilon \leq 1/2$.
Then we have
$\pi^2 + (1-\pi)^2 = (1/2+\epsilon)^2 + (1/2-\epsilon)^2 = 1/2 + 2\epsilon^2$
for the mean number of hits per trial under the probability matching
strategy. On the other hand, the mean number of hits per trial under the
pure strategy of predicting $B$ all the time is $\pi = 1/2 + \epsilon$. The
latter is greater than the former since
$2[\pi - (\pi^2 + (1-\pi)^2)] = 2\epsilon(1 - 2\epsilon) \geq 0$.

The reaction of another group of people was a kind of artistic
inspiration. As a consequence we are now able to appreciate a couple
of masterpieces of mathematical art.

The greatest of all is, according to my opinion, Estes' model
(Estes, 1959), since he uses only two parameters to describe guessing
curves. I am using wrong words. He calls his model a theory, and he is
not describing, but predicting, since a theory should predict, not describe.
You may wonder if it is possible to predict without describing. But Estes
did it, and I will show you how this stunt was done. His basic assumption
will be stated like this:

$$p_{i+1} = \lambda p_i + (1 - \lambda) a_i, \qquad 0 \leq \lambda < 1$$

His original expression (Estes, 1950) is different from this, but these are
equivalent. Now $p_i$ is the probability of predicting a specified event on
trial $i$; $a_i$ is a function taking the value 1 or 0 according as the
specified event has occurred or not on trial $i$, respectively. In psychology
this type of theory belongs to a class of reinforcement theories, since the
event obtained on trial $i$ reinforces the response oriented to that
particular event. $\lambda$ is a parameter, and another parameter in this
model is obviously $p_1$.

Now this equation has a form directly applicable to individual guessing
quotients, and it should be true, if a reinforcement theory of this type is
to have any validity at all, that an individual guessing quotient in general
increases when the specified response is reinforced and decreases when the
alternative response is reinforced. I tested this assumption with my data,
and the assumption was confirmed only in about 25% of the whole set of ?00
trials. Now Estes did not use individual guessing quotients but only their
averages. So $a_i$, in this equation, is also replaced by its average or its
expectation $\pi$. Once this was done, it is really easy to obtain an
explicit form for $p_i$.


$$p_{i+1} = \pi - (\pi - p_1)\lambda^{i}$$


Since $0 \leq \lambda < 1$, now we have


$$\lim_{i \to \infty} p_i = \pi$$


Thus the probability matching effect is predicted, even though its
prediction concerning the direction of change of guessing quotient is
wrong 75% of the time.
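The linear-operator rule, its averaged form, and the kind of direction-of-change check reported above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the rule that trials with no change in the quotient are left out of the count is an assumption of this sketch, since the talk does not say how such ties were handled.

```python
def linear_operator_step(p, a, lam):
    """One step of p_{i+1} = lam * p_i + (1 - lam) * a_i."""
    return lam * p + (1.0 - lam) * a


def expected_curve(p1, pi, lam, n_trials):
    """Averaged form p_{i+1} = pi - (pi - p1) * lam**i; entry k is p_{k+1}."""
    return [pi - (pi - p1) * lam ** k for k in range(n_trials)]


def direction_agreement(quotients, events):
    """Share of trial-to-trial changes that go in the reinforced direction.

    quotients[i] is the observed guessing quotient on trial i and
    events[i] is 1 if the specified event occurred on trial i, else 0.
    Trials with no change are skipped (an assumption of this sketch).
    """
    agree = 0
    counted = 0
    for i in range(len(quotients) - 1):
        change = quotients[i + 1] - quotients[i]
        if change == 0:
            continue
        counted += 1
        if (change > 0) == (events[i] == 1):
            agree += 1
    return agree / counted if counted else float("nan")
```

Feeding observed quotients and the event sequence to `direction_agreement` gives the proportion of trials on which a reinforcement model of this type predicts the right direction of change.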

Parsimony in the number of necessary parameters is certainly a virtue
in a good theory. But, according to an oriental belief, a virtue is
something hard to obtain. So I like to take a hard way, starting with a
purely descriptive model which has as many parameters as possible to take
care of various information involved in the data. The number of parameters
may then be reduced if one is lucky enough to find that some of the
parameters are redundant.

I should say that this had been my belief before I got into the present
problem. Then I found out that I was too optimistic. If I use a model
with too many parameters, I would simply be stuck with the impossibility of
parameter estimation, and furthermore there is no purely descriptive model.
A model becomes a theory once the model is applied to real data. So there
is always a danger in using a single model for the purpose of analysis of
data, even if the model is primarily oriented toward a description. That
much was the lesson I obtained from my frustrating experience of trial and
error, and I am now just hoping that, after hearing my experience, some of
you could tell me if there is a better strategy.

Now let me get back to the data. The three sequences used in my first
experiment, the results of which are shown in Figures 1, 2, and 3, are
named "long-run sequence," "medium-run sequence," and "uneven probability
sequence." The long-run sequence is characterized by $\pi(X) = 0.50$ and
also by the conditional probability $\pi_X(X) = .70$ ($X$ = "Blue" and "Red").
The medium-run sequence is characterized by $\pi(X) = 0.50$ and
$\pi_X(X) = .?7$. The third sequence is characterized by
$\pi(B) = \pi_B(B) = .75$ and $\pi(R) = \pi_R(R) = .25$.

Now all the three sets of guessing curves shown in Figures 1, 2, and
3 have definite but different structures. If I want to say anything more
specific, however, I need a descriptive model. And at that stage of my
research, there was none. Even the most well-formed descriptive model of
learning, the Bush-Mosteller model (Bush & Mosteller, 1955), has too
strong a set of assumptions to be applicable to these structures.

This much seemed to be obvious: if a sequence of responses had a
structure, and if the structure was different for different event sequences,
then the structure of response sequences should somehow correspond to the
structure of event sequences. This hypothesis was easy to check,
particularly as I had an impression that Ss were responding principally to
run length. Although this could not be the only factor responsible for
the response structure, I decided to employ a simple pilot model which,
while it was very poor in its descriptive capacity and just absurd as a
theory, had the virtue of giving no trouble in estimating its parameters
and could serve to test my hypothesis about run length.

[Figure: The observed guessing curves for the first 100 trials of the
long-run sequence and the corresponding curves predicted by the No-Learning
Model.]

[Fig. 8: Observed and predicted guessing quotients for the first 100 trials
of the short-run sequence (No-Learning Model).]

[Figure: Observed and predicted guessing quotients for the second 100 trials
of the short-run sequence (No-Learning Model).]

Now, my first pilot model may be called the no-learning run-dependent
model. Its basic set of assumptions is as follows. The length of the run of
the same event which Ss have just observed is the only factor that
determines their response probability. This run-length-dependent response
probability is assumed constant throughout the course of the experiment.
That is, there is absolutely no learning. Let me give you an example and
define the notation I am going to use. Let the sequence of events be
like this:


    events:   o  o  o  x  x  o  o  o  o  x  o  o
    trials:   1  2  3  4  5  6  7  8  9 10 11 12

(In the original diagram the rows below the trials marked, for run classes
1 through 5, the cycles within each run class.)



Any trial on which S is in the state of just having observed a run of
length $n$ is said to belong to the run class $n$. A serial number is
attached to each trial belonging to the same run class in the order of its
appearance in the whole sequence, and is called the cycle number of the
trial within the run class. $P_i(n)$ denotes the proportion of Ss (guessing
quotient) who predicted on trial $i$ the same event obtained on trial $i-1$,
and $n$ indicates that trial $i$ belongs to run class $n$. In general $p_i(n)$
is used to denote the theoretical prediction for $P_i(n)$. Now, what the
no-learning model amounts to is that $p_i(n) = c_n$, where $c_n$ is a
constant for each $n$, independent of trial number $i$.
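The definitions of run class and cycle number can be made concrete with a short sketch applied to the twelve-event example. The function names are chosen for illustration, and the estimate of the no-learning constants as the plain average of the observed quotients within each run class is an assumption of this sketch; the talk says only that the estimation gave no trouble.

```python
def run_classes(events):
    """Run class and cycle number of every trial.

    The first trial gets None, since no run has been observed before it.
    """
    classes = [None]
    cycles = [None]
    seen = {}          # how many trials each run class has had so far
    run_len = 0
    for k in range(len(events) - 1):
        # Length of the run that ends with event k.
        if k > 0 and events[k] == events[k - 1]:
            run_len += 1
        else:
            run_len = 1
        seen[run_len] = seen.get(run_len, 0) + 1
        classes.append(run_len)
        cycles.append(seen[run_len])
    return classes, cycles


def no_learning_fit(same_quotients, classes):
    """Constant c_n per run class, taken as the mean observed quotient
    (the averaging rule is an assumption of this sketch)."""
    sums, counts = {}, {}
    for q, n in zip(same_quotients, classes):
        if q is None or n is None:
            continue
        sums[n] = sums.get(n, 0.0) + q
        counts[n] = counts.get(n, 0) + 1
    return {n: sums[n] / counts[n] for n in sums}


classes, cycles = run_classes("oooxxooooxoo")
```

For the example sequence, trial 10 is the only trial in run class 4, because it is the only trial that follows a run of four identical events.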

[Fig. 10: Values of the estimated parameters used in the No-Learning Model
fit. $p(n)$ is the probability of predicting the same event as the
preceding, following a run of length $n$. Abscissa: $n$ (length of the just
preceding run).]



For the purpose of testing this model I conducted the second
experiment, in which the long-run sequence and the medium-run sequence are
extended to 200 trials, and a short-run sequence characterized by
$\pi(X) = .50$ and $\pi_X(X) = .25$ is added to them. The results are shown in
Figures 4 through 9 along with the corresponding no-learning model
predictions.

Obviously, this assumption of no-learning is absurd. So, to keep
an eye on its absurdity, I estimated the parameters separately for the
first 100 trials and the second 100 trials of each sequence. The
estimated parameter values are plotted in Figure 10. From these figures
it is all too clear that this absurd model worked very well.

To sum up the conclusions drawn from this pilot analysis: First, Ss
do differentially respond to different lengths of preceding run. Secondly,
there is a learning effect, as seen in Figure 10, and this effect is most
pronounced in the long-run sequence and very little in the short-run
sequence.

Now let me proceed to my next pilot model, which now contains an
element of learning, so that $p_i(n)$ is no longer constant. And this model
will be called Decay Model I, or time-dependent decay model. The exact
description of this model is given as follows.

DECAY MODEL I: TIME-DEPENDENT DECAY MODEL

$p_i(n)$: probability of predicting the same event as occurred on the just
    preceding trial. This probability depends upon which run class $n$ the
    trial $i$ belongs to.

$u_i(n)$: response weight for predicting the same event as occurred on the
    just preceding trial.

$v_i(n)$: response weight for predicting the opposite event to the one
    occurred on the just preceding trial.

$\lambda$: a parameter, $0 \leq \lambda < 1$

$\mu$: a parameter, $0 \leq \mu$

$\nu$: a parameter, $0 \leq \nu$

$\alpha_i(n) = 1$ if $i$ belongs to run class $n$ and the same event occurs on
    trial $i$; $= 0$ otherwise.

$\beta_i(n) = 1$ if $i$ belongs to run class $n$ and the opposite event occurs
    on trial $i$; $= 0$ otherwise.

The following system of equations holds for each value of $n$.

(1)  $p_i(n) = u_i(n) / \omega_i(n)$

(2)  $\omega_i(n) = u_i(n) + v_i(n)$

(3)  $u_{i+1}(n) = \lambda u_i(n) + \mu \alpha_i(n)$

(4)  $v_{i+1}(n) = \lambda v_i(n) + \nu \beta_i(n)$


As is obvious from equations (1) through (4), the trial-by-trial change
in response tendency is not directly described in terms of $p$ as it is in
the Bush-Mosteller or Estes models, but it is described in terms of
response weights $u$ and $v$, and response probability $p$ is given by
normalizing $u$ with respect to the total weight $\omega$.

This type of model is often called a non-linear model, but I would
rather like to call it a quasi-linear model, since it has many
characteristics in common with linear models.

Now let me explain about this Decay Model I characterized by Eqs. (1)
through (4). Take a trial $i$, for example. The trial may be preceded by a
run of length $n$, so that it will belong to run class $n$. Suppose that the
maximum run length appearing in the total event sequence is $m$. Then the
model assumes that at least $m$ pairs of response weights $u$ and $v$
potentially exist, among which only the pair which corresponds to $n$,
$u(n)$ and $v(n)$, determines $p_i$, the probability of predicting on trial
$i$ the same event as occurred on trial $i-1$. Now, suppose that the same
event obtained again on the trial $i$. Then these pairs of response
weights change on the next trial in such a way that

$$u_{i+1}(m) = \lambda u_i(m), \qquad v_{i+1}(m) = \lambda v_i(m) \qquad (m \neq n)$$

$$u_{i+1}(n) = \lambda u_i(n) + \mu, \qquad v_{i+1}(n) = \lambda v_i(n)$$

That is, all the weights except $u(n)$ decrease by a constant fraction
$\lambda$, and $u(n)$ ordinarily increases. This change in response weights is
reflected in $p(n)$ in such a way that

$$p_{i+1}(n) \geq p_i(n), \qquad p_{i+1}(m) = p_i(m) \qquad (m \neq n)$$

So all the $p(n)$ except one remain unchanged on each trial. The covert
process of constant decay of response weights, however, has the following
effect. As the interval between two successive cycles within the same run
class gets longer, the impact of the additional constant $\mu$ or $\nu$ upon
the resultant weight becomes greater, and therefore the accompanying change
in $p(n)$ becomes also greater.
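The bookkeeping of Decay Model I can be sketched as a short simulation of equations (1) through (4). The function name, the dictionary representation of the weight pairs, and the 1-based trial numbers in the output are assumptions of this sketch; the initial weights must be supplied for every run class that occurs in the sequence, and the total weight of a class is assumed positive whenever that class is consulted.

```python
def decay_model_1(events, lam, mu, nu, u1, v1):
    """Predicted probability of 'same event as on the preceding trial'.

    u1 and v1 map each run class n to its initial weights.
    Returns a list of (trial number, run class, p) from trial 2 onward.
    """
    u, v = dict(u1), dict(v1)
    predictions = []
    run_len = 0
    for i in range(1, len(events)):
        # Run class of this trial: length of the run ending on the trial before.
        if i >= 2 and events[i - 1] == events[i - 2]:
            run_len += 1
        else:
            run_len = 1
        n = run_len
        predictions.append((i + 1, n, u[n] / (u[n] + v[n])))  # Eqs. (1), (2)
        same = events[i] == events[i - 1]
        # Every pair of weights decays on every trial ...
        for m in u:
            u[m] *= lam
            v[m] *= lam
        # ... and only the pair of the current run class gains an increment.
        if same:
            u[n] += mu      # Eq. (3) with alpha = 1
        else:
            v[n] += nu      # Eq. (4) with beta = 1
    return predictions


preds = decay_model_1([0, 0, 1, 0], lam=0.5, mu=1.0, nu=1.0,
                      u1={1: 1.0, 2: 1.0}, v1={1: 1.0, 2: 1.0})
```

Because decay multiplies both weights of a class by the same factor, the probability of a class that is not visited stays unchanged, while the increment added on the next visit weighs more heavily the longer the class has been left to decay.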

Now let me give you just an intuitive interpretation of this model.
Suppose that S is classifying information given by each observation of
event according to the length of the just preceding run. Suppose that
$u(n)$ and $v(n)$ can be interpreted as the subjectively evaluated amounts
of evidence respectively supporting the predictions "same" and "opposite"
corresponding to the category $n$. Then this model means that S is employing
a strategy for information bookkeeping such that the whole stock of
evidence is depreciated by a constant fraction $\lambda$ each time he proceeds
one trial forward. Obviously, this strategy has a certain sense in view of
adaptation.

I did not use all the data for the purpose of testing this Decay Model I,
but used the first 50 trials of the long-run sequence and the first 50
trials of the medium-run sequence. These were the trials on which the
data of the first experiment and the second experiment could be pooled.
Since this Decay Model I was another pilot model, I wanted first to try out
the model with the most precise part of the data. For the same reason I did
not use a precise, but time-consuming, method for parameter estimation, but
attempted to find plausible-looking parameter values through trial and
error. The result of the fitting is shown in Figure 11. The parameter
values used here are as follows:

$\lambda = .926$, $\omega_1(n) = 10.0$ (for $n$ = 1, 2, and 3),

$u_1(1) = 5.0$ ($v_1(1) = 5.0$), $u_1(2) = 2.5$ ($v_1(2) = 7.5$),

$u_1(3) = 3.0$ ($v_1(3) = 7.0$), $\mu$ = V4, and $\nu = 1$.

(Those parameter values given in parentheses are derived from others.
$\nu = 1$ is chosen since we can choose the unit of weight arbitrarily.) Now,
in Figure 11, horizontal broken lines represent the corresponding
no-learning model predictions. This no-learning model fit uses six
parameters, and Decay Model I also uses six independent parameters. As you
see, a considerable improvement of data description has been made by moving
from the first pilot model to the second.

Having been encouraged by this success, I tried to fit this model
to the remaining part of the data. The result was that the fit was by
and large worse than in the no-learning model. There may be two possible
alternative interpretations of this result. One is that the success of
Decay Model I on the first 50 trials is an artifact, and the other is
that some change in Ss' response structure takes place at about the 50th
trial. I am now inclined to believe the second possibility for a couple of
reasons, but I will not go into that issue now.

Because of this partial success of the time-dependent decay model, I
wanted to try another type of decay model, which may be called the
"Cycle-dependent Decay Model," or simply Decay Model II. The model is my
third pilot model, and its formal description is as follows.

DECAY MODEL II: CYCLE-DEPENDENT DECAY MODEL

In the Decay Model II, the meaning of subscript $i$ of variables is so
changed that it now indicates the cycle number of the run class $n$ instead
of the serial trial number as it is in Decay Model I.

$\lambda$: a parameter, $0 < \lambda$

$\epsilon$: a parameter, $-1 < \epsilon < 1$

$\alpha_i(n) = 1$ if the same event as that on the just preceding trial
    occurs on cycle $i$ of run class $n$; $= 0$ otherwise.

$\beta_i(n) = 1$ if the opposite event to that on the just preceding trial
    occurs on cycle $i$ of run class $n$; $= 0$ otherwise.

The following system of equations holds for each value of $n$.

(5)  $p_i(n) = u_i(n) / \omega_i(n)$

(6)  $\omega_i(n) = u_i(n) + v_i(n)$

(7)  $u_{i+1}(n) = \lambda u_i(n) + (1 + \epsilon) \alpha_i(n)$

(8)  $v_{i+1}(n) = \lambda v_i(n) + (1 + \epsilon) \beta_i(n)$


Now, the major difference of this new decay model from the first one
is that response weights do not decay on each trial, but decay only on each
cycle belonging to the same run class. Then this is certainly a simpler
model than the first. Another difference is a minor modification of
notation.
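The cycle-dependent variant can be sketched in the same way as the first decay model. In this sketch only the weights of the run class of the current trial are touched, so decay proceeds once per cycle of that class. The function name and data layout are assumptions; the increment is taken as 1 + epsilon for both responses, which is how the equations read in this text, and with epsilon = 0 (the value actually used in the fit) both increments equal one unit of weight. A class whose total weight is still zero has no defined probability and is reported as None.

```python
def decay_model_2(events, lam, eps, u1, v1):
    """Cycle-dependent decay: a weight pair changes only on its own cycles.

    u1 and v1 map each run class n to its initial weights.
    Returns a list of (trial number, run class, p or None) from trial 2 onward.
    """
    u, v = dict(u1), dict(v1)
    predictions = []
    run_len = 0
    for i in range(1, len(events)):
        if i >= 2 and events[i - 1] == events[i - 2]:
            run_len += 1
        else:
            run_len = 1
        n = run_len
        total = u[n] + v[n]
        predictions.append((i + 1, n, u[n] / total if total > 0 else None))
        same = events[i] == events[i - 1]
        increment = 1.0 + eps   # assumed identical for both responses
        # Decay and increment apply to run class n only.
        u[n] = lam * u[n] + (increment if same else 0.0)
        v[n] = lam * v[n] + (0.0 if same else increment)
    return predictions


preds2 = decay_model_2([0, 0, 1, 0], lam=0.5, eps=0.0,
                       u1={1: 1.0, 2: 0.0}, v1={1: 1.0, 2: 0.0})
```

Compared with the time-dependent sketch, the only structural change is that the loop over all run classes has disappeared: the weights of a class stay frozen between its cycles.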

[Fig. 12: The exact fit of Decay Model II to P* for n = 1 and 2 of the
long-run sequence.]

[Fig. 13: Exact fit of Decay Model II to P* for n ...]

[Fig. 15: Exact fit of Decay Model II to P* for n ... of the short-run
sequence. Ordinate: guessing quotient. Panel: Short-Run Sequence, Run
Class 1 (cont.).]

Before entering into the application of this model, I should mention
another type of analysis I did. All the mathematical models so far applied
to the guess process, including my own, assume that response probabilities
are affected only by physical events and not by Ss' own responses. But it is
psychologists' common sense that responses are affected by previous
responses too. The existence of this effect in guess processes was first
demonstrated by Hake and Hyman (1953).

As a matter of fact, the effect of success and failure on preceding
trials upon the response found in my data is really complicated. The nature
and amount of effect differs from sequence to sequence and from run class
to run class. Furthermore, the effect does not disappear even at the end
of 200 trials. And a trouble with linear and quasi-linear models is that
they are very rigid about their probability matching property, and their
descriptive capacity tends to be poorer and poorer as trials proceed. So
after finding this effect, I was again forced to use partial data. I
recalculated guessing quotients on each trial for only those Ss whose
prediction on the just preceding trial was a success, and denoted them
as P*. Analogously, I calculated P for those Ss who failed on the just
preceding trial. These P* and P are plotted in Figures 12 through 16 for
the three sequences and the corresponding run classes 1 and 2. And I
applied the Decay Model II only to P*, since by and large P* is more
reliable than P. (I am making full use of the excuse that I am dealing
with pilot models.) The results of the fit of Decay Model II to P* are also
plotted in the same figures. The parameter values used are as follows:

$\lambda = 0.942$, $\omega_1(1) = 5$, $\omega_1(2) = 0$, $u_1(1) = 2.5$, and
$\epsilon = 0$ ($u_1(2)$ is automatically 0 since $\omega_1(2) = 0$). Here
$\epsilon = 0$ actually means that I gave up using $\epsilon$, and therefore
that $\epsilon$ is dropped from the model. So the real number of parameters I
used to fit the model to the partial data is four. Taking into account this
small number of parameters used, I could say that the fit of the model to
the data is moderately good, although the fit to the first 50 trials is
worse than that of Decay Model I.
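The splitting of guessing quotients according to success or failure on the just preceding trial can be sketched as follows. The data layout (one list of predicted events per subject) and the function name are assumptions of this sketch; the quotient is taken, as elsewhere in the run-class analysis, with respect to predicting the same event as occurred on the preceding trial.

```python
def split_quotients(predictions, events):
    """Guessing quotients after a success and after a failure.

    predictions[s][i] is the event subject s predicted on trial i and
    events[i] is the event that occurred. Returns two lists indexed by
    trial; an entry is None when no subject falls in that group.
    """
    def same_quotient(group, i):
        if not group:
            return None
        return sum(s[i] == events[i - 1] for s in group) / len(group)

    after_success, after_failure = [None], [None]
    for i in range(1, len(events)):
        hit = [s for s in predictions if s[i - 1] == events[i - 1]]
        miss = [s for s in predictions if s[i - 1] != events[i - 1]]
        after_success.append(same_quotient(hit, i))
        after_failure.append(same_quotient(miss, i))
    return after_success, after_failure


p_star, p_fail = split_quotients([[0, 0], [1, 0], [0, 1]], events=[0, 1])
```

The first list corresponds to P*. Since each quotient now rests on only part of the group of Ss, the two curves are less stable than the unsplit guessing curve.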

Anyway, I think I have definitely demonstrated one thing through the
applications of these pilot models: guessing processes are by no means
simple. The apparent simplicity of averaged guessing curves is a complete
deception. Meanwhile, I still have a hope that someday I will be able to
solve this mysterious case.

Now, let me shift to my second theme of the present talk. So far
it has been a detective story. From now on, it will be a speech on police
science. That is, I want to talk about the parameter estimation of
Decay Model II.

[Figure: The value of $\theta$ obtained by the method of simple sum depends on
the trial at which the experiment is terminated; the same method gives a
smaller value of $\theta$ as more data are used for the estimation.]

I think the most valuable information I obtained through the course
of parameter estimation is not the final outcomes of the parameter
estimation but the things I learned through the course of estimation. In
books on statistics we find how optimal procedures of parameter estimation
are to be carried out, e.g., how one can use the maximum likelihood method
or the method of least squares. But I can find nowhere what are the next
bests when the bests are not practicable. In Bush and Mosteller's book, the
authors point out that these best methods can be applied to linear models
only in very special cases. If they are impracticable for linear models,
how much worse for quasi-linear models. So what I have done first was to
learn how mathematical psychologists estimate their parameters. What I
discovered was awful. I tentatively named one of the most popular methods
they used "the method of simple sum," which may be described as follows.
For the sake of simplicity, let us consider a single-parameter model
for a psychological process. The model gives a sequence of functions
$\phi_1(\theta), \phi_2(\theta), \ldots, \phi_i(\theta), \ldots$, where $i$ is the trial number and
$\theta$ is the parameter. On the other hand, there is a sequence of data
values $x_1, x_2, \ldots, x_i, \ldots$. Since nobody can hope that the model
completely fits the data, we should expect a deviation between the model
and the data on each trial. Let me denote the deviation by $\delta_i$ and call
it the error on trial $i$. Then we have a system of equations

$$x_i - \phi_i(\theta) = \delta_i(\theta), \qquad i = 1, 2, \ldots$$

Now if we sum each side of the equations over all the trial numbers, we
obtain

$$\sum_i x_i - \sum_i \phi_i(\theta) = \sum_i \delta_i(\theta).$$

If we estimate $\theta$ by putting the right-hand side of this equation equal
to zero, we have the method of simple sum. Now what this method of simple
sum really amounts to is to make total positive errors and total negative
errors balance. And some examples will easily show you how wrong a
conclusion one might be led to under certain rather common circumstances.
Take, for example, such a single-parameter model as

$$\phi_i(\theta) = 1 - e^{-i\theta}.$$

For various values of $\theta$, we obtain a family of theoretical curves as shown
in Figure 17. Now suppose that we obtained data which, although increasing
monotonically, have an asymptote less than 1. Then the absurd result we
obtain is that the more trials the experimenter runs, the smaller the
estimated $\theta$ obtained by the method of simple sum, as illustrated in
Figure 18. One may easily find this kind of absurd theoretical curve in
psychological journals. A more dramatic but more artificial example is
the following.

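Consider the following model:

$$\phi_n(\theta) = \begin{cases} 1 - e^{-n\theta}, & n = 1, 2, \ldots, m, \\ e^{-(n-m)\theta}, & n = m+1, m+2, \ldots, 2m. \end{cases}$$

Consider an extreme case such that the obtained data exactly follow
the model:

$$x_n = \begin{cases} 1 - e^{-n\theta_0}, & n = 1, 2, \ldots, m, \\ e^{-(n-m)\theta_0}, & n = m+1, m+2, \ldots, 2m. \end{cases}$$

Any reasonable parameter estimation procedure should give $\hat\theta = \theta_0$,
where $\hat\theta$ is the estimated $\theta$. But the method of simple sum can give no
estimate of $\theta$, since any value of $\theta$ satisfies the equation
$\sum_i \delta_i(\theta) = 0$.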

Suppose now that we had one more value on trial $2m+1$ which is not equal to
$e^{-(m+1)\theta_0}$. Let it be equal to

$$x_{2m+1} = e^{-(m+1)\theta_1},$$

where $\theta_0 > \theta_1$. Then the method of simple sum gives the estimate
$\hat\theta = \theta_1$. That is, the estimate is determined just by a single
irregular value. One may wonder who would use such an obviously absurd
method. The fact is that this is one of the most popular parameter
estimation methods when the best methods are impracticable.
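
The two examples above can be checked numerically. The following is only a sketch under stated assumptions: the data for the first example (asymptote 0.8, rate 0.5), the bisection bracket, and all function names are hypothetical choices made here for illustration, not values taken from the talk.

```python
import math

def phi(i, theta):
    """First example: phi_i(theta) = 1 - exp(-i * theta)."""
    return 1.0 - math.exp(-i * theta)

def simple_sum_estimate(data, lo=1e-6, hi=5.0, iters=200):
    """Solve sum_i (x_i - phi_i(theta)) = 0 for theta by bisection."""
    def total_error(theta):
        return sum(x - phi(i, theta) for i, x in enumerate(data, start=1))
    # total_error decreases in theta: positive near 0, negative for large theta.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_error(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def make_data(n_trials, asymptote=0.8, rate=0.5):
    """Hypothetical data: monotonically increasing, with asymptote below 1."""
    return [asymptote * (1.0 - math.exp(-rate * i)) for i in range(1, n_trials + 1)]

# First example: the estimate keeps shrinking as more trials are included.
estimates = {n: simple_sum_estimate(make_data(n)) for n in (5, 10, 20, 40, 80)}

def phi2(n, theta, m):
    """Second example: rises for n <= m, then decays for n > m."""
    if n <= m:
        return 1.0 - math.exp(-n * theta)
    return math.exp(-(n - m) * theta)

def sum_of_errors(theta, theta0, m):
    """Sum of errors over trials 1..2m when the data follow the model with theta0."""
    return sum(phi2(n, theta0, m) - phi2(n, theta, m) for n in range(1, 2 * m + 1))

# Second example: every theta balances the errors, so theta0 cannot be identified.
residuals = [sum_of_errors(t, theta0=0.7, m=10) for t in (0.05, 0.3, 2.0)]

if __name__ == "__main__":
    for n, est in estimates.items():
        print(n, round(est, 4))
    print(residuals)
```

In the second example the model predictions sum to $m$ whatever $\theta$ is, which is why the summed error vanishes identically.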

This method of simple sum appears under various disguises when the
number of parameters is more than one. Whether it is used or not can easily
be checked, however, by seeing if the equation used to estimate a parameter
is equivalent to putting the unweighted sum of errors equal to 0.

Now, after making this awful discovery, I attempted to obtain a set
of criteria for the admissibility of suboptimal methods of parameter
estimation. To do this I chose the method of least squares as the ideal;
the closer to it the better, since the maximum likelihood method is
usually further away in its practicability.

As you know, the method of least squares minimizes the sum of squared
deviations. Therefore, in our notation, the parameter estimation equation
is expressed as

$$\sum_i \phi_i'(\theta)\,\delta_i(\theta) = 0.$$

Let me call the left-hand side of this equation the error function of the
method of least squares. It seems to me that most methods of parameter
estimation have their characteristic error functions of the form

$$\sum_i w_i(\theta)\,\delta_i(\theta),$$

and the estimation of parameter $\theta$ is made by putting the function equal to
zero. And I think the inherent nature of a method of parameter estimation
is best demonstrated by the sequence of weights $w_i(\theta)$. For the method of
least squares, $w_i(\theta) = \phi_i'(\theta)$, and for the method of simple sum,
$w_i(\theta) = 1$ for all $i$.
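
To make the role of the weights concrete, the sketch below solves $\sum_i w_i(\theta)\,\delta_i(\theta) = 0$ for the two weightings just named, using the model $\phi_i(\theta) = 1 - e^{-i\theta}$ from the first example. The data (asymptote 0.8, rate 0.5), the bisection bracket, and the function names are assumptions introduced for illustration only.

```python
import math

def phi(i, theta):
    """Model prediction phi_i(theta) = 1 - exp(-i * theta)."""
    return 1.0 - math.exp(-i * theta)

def dphi(i, theta):
    """Derivative of phi_i with respect to theta."""
    return i * math.exp(-i * theta)

def error_function(theta, data, weight):
    """Weighted sum of errors: sum_i w_i(theta) * (x_i - phi_i(theta))."""
    return sum(weight(i, theta) * (x - phi(i, theta))
               for i, x in enumerate(data, start=1))

def estimate(data, weight, lo=1e-6, hi=5.0, iters=200):
    """Bisection; assumes the error function is positive at lo and negative at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if error_function(mid, data, weight) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simple_sum(i, theta):
    """Method of simple sum: w_i(theta) = 1 for all i."""
    return 1.0

least_squares = dphi  # method of least squares: w_i(theta) = phi_i'(theta)

def make_data(n_trials, asymptote=0.8, rate=0.5):
    """Hypothetical data with an asymptote below 1."""
    return [asymptote * (1.0 - math.exp(-rate * i)) for i in range(1, n_trials + 1)]

# (simple-sum estimate, least-squares estimate) for two experiment lengths
results = {n: (estimate(make_data(n), simple_sum),
               estimate(make_data(n), least_squares))
           for n in (40, 80)}

if __name__ == "__main__":
    for n, (ss, ls) in results.items():
        print(n, round(ss, 4), round(ls, 4))
```

With these hypothetical data, doubling the number of trials moves the least squares estimate much less than the simple-sum estimate, because the least squares weights fade on late trials where the prediction hardly depends on $\theta$.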


Now, from the nature of the least squares error function, I
derived two criteria for the admissibility of suboptimal methods. The
first one corresponds to the absolute value of $\phi_i'(\theta)$: the absolute value
of the weight of an admissible method should be great for such $i$ where the
prediction $\phi_i(\theta)$ is relatively sensitive to the variation of $\theta$, and
it should be small where $\phi_i(\theta)$ is relatively insensitive. Now you see
that the absurdity of the method of simple sum demonstrated by my first
example is due to its failure to comply with this criterion.

The next criterion for admissibility is concerned with the sign
of $\phi_i'(\theta)$: the sign of an admissible weight $w_i(\theta)$ should be different
when $\phi_i'(\theta)$ is positive and when $\phi_i'(\theta)$ is negative. If a method
satisfies this criterion, the kind of absurdity I have shown in the second
example could never happen.

With these two criteria I attempted to obtain an admissible method for
estimating the parameters of my Decay Model II. I picked $\lambda$ as the
parameter  to  be  estimated  first,  since  p  f  a  tv  u  t*  becomes  almost 
independent of all the parameters other than $\lambda$ for large $i$. Then
immediately I found a difficulty. There was no practicable and admissible
method for estimating $\lambda$. The only practicable method I found may be called
the method of simple ratio. (You see, I hate everything that is simple.)
This method is almost as popular as the method of simple sum, and is
slightly better than the latter since the former satisfies the second
criterion. But it does not satisfy the first criterion of admissibility.
Let me briefly describe this method since I was bound to use it.

Error $\delta_i$ is defined as before:

$$\delta_i = x_i - p_i.$$

According to the definition of Decay Model II


Define the simple ratio $k_i$ in such a way that

$$k_i = \begin{cases} \dfrac{1 - p_{i+1}}{1 - p_i} & \text{if } a_i = 1, \\[1ex] \dfrac{p_{i+1}}{p_i} & \text{if } a_i = 0. \end{cases}$$


Now by replacing $p_{i+1}$ and $p_i$ in the definition of $k_i$ with the above
expression of $p_{i+1}$ and the analogous expression of $p_i$, we obtain


k  -04!  <  f  mV  >  1 

A  i  A 


irrespective of the value of $\varepsilon$.

Now from the definitions of Decay Model II we can easily derive


M>.  *  u>  +  [1  A*) /(I  -  A)  »  VP  -  A)  *  OA  A*'*) 


4-1 


So, by substituting this into $k_i$, we obtain


*  A  ♦ 


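So, by assuming $k_i = \lambda$ for large $i$, we obtain

$$(1 - x_{i+1}) - \lambda(1 - x_i) = -\delta_{i+1} + \lambda\delta_i, \qquad \text{if } a_i = 1,$$

$$x_{i+1} - \lambda x_i = \delta_{i+1} - \lambda\delta_i, \qquad \text{if } a_i = 0.$$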


We obtain either one of the two types of equations for each large $i$, so
that by taking the sum of each side of these equations across large $i$ and
putting the right-hand side, which is the error function of this method,
equal to zero, we can estimate $\lambda$.

Now, let us take a look at the error function of this method. From
the above equations you can easily see that the weights of the error function
have the following form:

$$w_i = \begin{cases} \pm(1 - \lambda) & \text{if } a_{i-1} = a_i, \\ \pm(1 + \lambda) & \text{if } a_{i-1} \neq a_i. \end{cases}$$

So each weight can take only one of the two values, and the difference
between these two values is great since the estimated $\lambda$ is fairly close to
1. So in estimating $\lambda$, what this method is actually doing is taking into
account only those data on the trials where the theoretical curve inflects.
And the worst aspect of this is that those are the trials where, by and
large, $p_i$ is most insensitive to the variations of $\lambda$.
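
The weighting just described can be made visible with a small calculation. The sketch below assumes that the large-trial relations stated above hold exactly (an idealization, not the full Decay Model II); the event sequence, the starting value, and all function names are hypothetical.

```python
def weights(events, lam):
    """Coefficient of delta_j in the summed right-hand sides, for interior trials j.

    Equation i contributes sign_i * (delta_{i+1} - lam * delta_i), where
    sign_i = +1 if a_i = 0 and -1 if a_i = 1.
    """
    sign = [1.0 if a == 0 else -1.0 for a in events]
    # delta_j enters equation j-1 with sign[j-1] and equation j with -lam * sign[j].
    return [sign[j - 1] - lam * sign[j] for j in range(1, len(events))]

def estimate_lambda(x, events):
    """Set the summed right-hand sides to zero and solve the left-hand sides for lambda."""
    num = den = 0.0
    for i, a in enumerate(events):
        if a == 0:
            num += x[i + 1]
            den += x[i]
        else:
            num += 1.0 - x[i + 1]
            den += 1.0 - x[i]
    return num / den

def simulate(events, lam, p0=0.5):
    """Hypothetical error-free values obeying the large-trial relations exactly."""
    p = [p0]
    for a in events:
        # a == 1: (1 - p) shrinks by the factor lam; a == 0: p shrinks by lam.
        p.append(1.0 - lam * (1.0 - p[-1]) if a == 1 else lam * p[-1])
    return p

events = [0, 1, 1, 0, 1, 0, 0, 1]
x = simulate(events, lam=0.9)
lam_hat = estimate_lambda(x, events)

# Weights for a short sequence: small where the event repeats, large where it switches.
w = weights([0, 0, 1, 1, 0], 0.9)   # approximately [0.1, 1.9, -0.1, -1.9]

if __name__ == "__main__":
    print(lam_hat)
    print(w)
```

With $\lambda = 0.9$ the weight is about nineteen times larger in magnitude on trials where the event switches than on trials where it repeats, which is the imbalance the text objects to.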

Anyway, this is what I have done for estimating $\lambda$, and the estimated
$\lambda$s are fairly similar for u and u, for different sequences, and for
different run classes. However, a slight increasing tendency with trial is
observed in the estimated value of $\lambda$.

Once $\lambda$ is estimated, the next problem is the simultaneous estimation
of the remaining three parameters, $\mu$, $\omega$, and $\varepsilon$. But since this appeared
to me technically impossible, I first dropped $\varepsilon$ from the model by assuming
$\varepsilon = 0$. Then it is possible, at least in principle, to estimate $\omega$ and $\mu$
simultaneously, since they can be separated by utilizing $k_i$ again, this
time for small values of $i$. From now on, I will not go into technical
details, except for a few points of major interest.

By utilizing $k_i$ and again applying the method of simple ratio, we
obtain an equation for estimating $\omega$, of which the error function is
characterized by weights of the following form:

iv  *  t  l  ?  :  A !  I  1  *  a.4"1  z ' 

where 

?  (»  A ' w  ' 


Now the point here is that this time we can improve this method to
some extent by taking a look at the error function of the least squares
method. That is, even though the least squares method itself is not
applicable, we can modify the error function of a suboptimal method so
as to bear more resemblance to the error function of the least squares
method. As a result of this kind of modification, we obtain

$$w_i = \pm\,(1 + \lambda^{i} z),$$

and you see the $\delta$'s on initial trials are more heavily weighted than
before.

Now suppose that I obtained an estimate of $\omega$ by this method, although
this is actually a false statement. Then the only remaining parameter, $\mu$,
can be estimated, for the first time, directly by the method of least
squares.

However, since the modified method of simple ratio didn't work, I
dropped $\mu$ too from the model by assuming it equal to $\omega/2$ for run class 1,
and attempted to obtain the least squares estimate of $\omega$ by successive
approximation. The method is very simple. The reason why the least squares
method is usually impracticable is that the weight of the error function,
$\phi_i'(\theta)$, is usually a fairly complicated function of $\theta$. But if we replace
this unknown $\theta$ in $\phi_i'(\theta)$ by its arbitrary estimate $\theta^*$, then the
estimation equation

$$\sum_i \phi_i'(\theta^*)\,\delta_i(\theta) = 0$$

is often solvable. Then, if the estimate $\hat\theta$ obtained by solving this
approximate equation is considerably different from $\theta^*$, you will replace
$\theta^*$ by $\hat\theta$ and repeat the same procedure, although I think repetition is
usually unnecessary since it is easy to get a fairly good estimate $\theta^*$ to
start out just by a trial and error calculation.
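
The successive approximation just described can be sketched in a few lines. This is an illustration under assumptions, not the computation carried out for Decay Model II: it uses the single-parameter model $\phi_i(\theta) = 1 - e^{-i\theta}$ from the earlier example, hypothetical data, and a bisection solver chosen here for convenience.

```python
import math

def phi(i, theta):
    """Model prediction phi_i(theta) = 1 - exp(-i * theta)."""
    return 1.0 - math.exp(-i * theta)

def dphi(i, theta):
    """Derivative of phi_i with respect to theta (the least squares weight)."""
    return i * math.exp(-i * theta)

def solve_with_fixed_weights(data, theta_star, lo=1e-6, hi=5.0, iters=200):
    """Solve sum_i phi_i'(theta_star) * (x_i - phi_i(theta)) = 0 for theta."""
    w = [dphi(i, theta_star) for i in range(1, len(data) + 1)]

    def f(theta):
        return sum(wi * (x - phi(i, theta))
                   for i, (wi, x) in enumerate(zip(w, data), start=1))

    # With the weights frozen and positive, f decreases in theta,
    # so bisection on a bracket with f(lo) > 0 > f(hi) finds the root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def successive_approximation(data, theta_star, tol=1e-12, max_rounds=50):
    """Replace theta_star by the new estimate until the two agree."""
    for _ in range(max_rounds):
        theta_new = solve_with_fixed_weights(data, theta_star)
        if abs(theta_new - theta_star) < tol:
            return theta_new
        theta_star = theta_new
    return theta_star

# Hypothetical data: model values for theta = 0.3 plus a small alternating error.
data = [phi(i, 0.3) + 0.02 * (-1) ** i for i in range(1, 21)]
theta_hat = successive_approximation(data, theta_star=0.1)

# Residual of the exact least squares equation at the final estimate.
normal_eq = sum(dphi(i, theta_hat) * (x - phi(i, theta_hat))
                for i, x in enumerate(data, start=1))

if __name__ == "__main__":
    print(theta_hat, normal_eq)
```

When the starting value $\theta^*$ is already close, the first round changes it very little, which is consistent with the remark that repetition is usually unnecessary.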

Anyway, this method again failed in my case. And by now the reason
for all the failures is clear. The guessing quotients on the first couple
of cycles are completely beyond the descriptive framework of Decay Model II.
(For a probable reason, see Toda, 1962c.) And since all those improved
estimation methods give heavy weights to those initial trials where the
theoretical values are most sensitive to the variation of $\omega$, it is no
wonder that I ended up with utterly incomprehensible estimates of $\omega$.

Anyway, these failures led me to an entirely new line of approach. I
attempted to use the method of minimum absolute error, that is, to estimate
parameters by minimizing the sum of absolute errors, and it turned out
that this method is very useful. At any rate, the method of minimum
absolute error should be at least as good as the method of least squares,
and furthermore, it has the very nice property of disregarding exceptional
data values. But this does not mean that this method innocently gives us
estimated values no matter how exceptional the values in the data may be.

On the contrary, it gives us precise information through the course of
estimation about which values are exceptional and in what way they are
exceptional. Unfortunately, I have no time to go into details of this
method. But I am convinced that this relatively unknown method is worth
more attention by the users of stochastic models.
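
The property of disregarding exceptional values can be illustrated with a small comparison. The sketch below is not the procedure used in the study, whose details are not given; it assumes the single-parameter model $\phi_i(\theta) = 1 - e^{-i\theta}$, hypothetical error-free data with one exceptional value on trial 3, and a plain grid search.

```python
import math

def phi(i, theta):
    """Model prediction phi_i(theta) = 1 - exp(-i * theta)."""
    return 1.0 - math.exp(-i * theta)

def fit(data, loss, grid):
    """Grid search: return the theta on the grid that minimizes the summed loss."""
    def total(theta):
        return sum(loss(x - phi(i, theta)) for i, x in enumerate(data, start=1))
    return min(grid, key=total)

grid = [k / 1000.0 for k in range(1, 2001)]    # candidate values 0.001 .. 2.000

theta0 = 0.3
data = [phi(i, theta0) for i in range(1, 21)]  # data that follow the model exactly
data[2] = 0.05                                 # one exceptional value on trial 3

theta_lad = fit(data, abs, grid)               # minimum absolute error
theta_ls = fit(data, lambda d: d * d, grid)    # least squares, for comparison

# The residuals at the minimum-absolute-error estimate point at the odd trial.
residuals = [abs(x - phi(i, theta_lad)) for i, x in enumerate(data, start=1)]
exceptional_trial = residuals.index(max(residuals)) + 1

if __name__ == "__main__":
    print(theta_lad, theta_ls, exceptional_trial)
```

In this constructed case the minimum absolute error estimate stays at the generating value and the largest residual marks the exceptional trial, while the least squares estimate is pulled away from it.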